Computer and Modernization ›› 2012, Vol. 1 ›› Issue (1): 34-36,8. doi: 10.3969/j.issn.1006-2475.2012.01.009

• Artificial Intelligence •

一种基于贝叶斯分类的XML检索文档相似度算法

韩晓梅,郑洪源,丁秋林   

  1. 南京航空航天大学计算机科学与技术学院,江苏 南京 210016
  • 收稿日期:2011-09-09 修回日期:1900-01-01 出版日期:2012-01-10 发布日期:2012-01-10

An XML Retrieval Document Similarity Algorithm Based on Bayesian Classifier

HAN Xiao-mei, ZHENG Hong-yuan, DING Qiu-lin   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received:2011-09-09 Revised:1900-01-01 Online:2012-01-10 Published:2012-01-10

摘要: 目前对于查询相似度的计算通常是从比对检索结果与查询式的相似度来考虑。本文提出一种基于贝叶斯分类的算法来计算XML查询结果相似度。在计算出每个检索结果文档与查询式相似度的基础上,使用贝叶斯分类器将XML检索文档分类成相关与不相关两个集合,再由计算相关文档与不相关文档的相似度来决定最终的相似度值。最后,通过实验分析表明,在不影响查全率的前提下,这样得到的相似度计算精度比传统方法高15%左右,有效地提高了检索性能。

关键词: 贝叶斯分类, 查询相似度, XML检索文档, 信息检索

Abstract: Query similarity is currently computed, in most approaches, by comparing each retrieval result with the query. This paper proposes an algorithm based on a Bayesian classifier for computing the similarity of XML query results. After the similarity between each retrieved document and the query has been computed, a Bayesian classifier divides the retrieved XML documents into a relevant set and an irrelevant set, and the final similarity value is then determined by computing the similarity of the relevant and irrelevant documents. Experimental analysis shows that, without affecting recall, the resulting similarity computation is about 15% more precise than the traditional method, which effectively improves retrieval performance.

Key words: Bayesian classifier, query similarity, XML retrieval document, information retrieval
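
The three steps named in the abstract (score each retrieved document against the query, split the documents into relevant and irrelevant sets with a Bayesian classifier, then derive a final similarity from the two sets) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: term-frequency vectors with cosine similarity, a multinomial naive Bayes classifier with Laplace smoothing trained on a few labeled documents, and a final score that adds similarity to the relevant set and subtracts similarity to the irrelevant set. The function names (`terms`, `cosine`, `train_nb`, `is_relevant`, `rerank`) are hypothetical and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def terms(xml_text):
    """Term-frequency vector built from all text nodes of an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).lower().split())


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_nb(labeled):
    """Collect per-class term counts from (vector, is_relevant) pairs.

    Assumes at least one training document in each class.
    """
    counts = {True: Counter(), False: Counter()}
    ndocs = {True: 0, False: 0}
    vocab = set()
    for vec, label in labeled:
        counts[label].update(vec)
        ndocs[label] += 1
        vocab.update(vec)
    return counts, ndocs, vocab


def is_relevant(model, vec):
    """Multinomial naive Bayes decision with Laplace (add-one) smoothing."""
    counts, ndocs, vocab = model
    total_docs = ndocs[True] + ndocs[False]
    log_post = {}
    for label in (True, False):
        class_total = sum(counts[label].values())
        lp = math.log(ndocs[label] / total_docs)
        for term, n in vec.items():
            if term in vocab:  # terms unseen in training are ignored
                lp += n * math.log((counts[label][term] + 1) / (class_total + len(vocab)))
        log_post[label] = lp
    return log_post[True] > log_post[False]


def rerank(query, docs, model):
    """Return (name, score) pairs sorted by the illustrative final score."""
    rel_centroid, irr_centroid = Counter(), Counter()
    for vec in docs.values():
        (rel_centroid if is_relevant(model, vec) else irr_centroid).update(vec)
    scored = []
    for name, vec in docs.items():
        # Assumed combination rule, not the paper's formula:
        # query similarity + similarity to relevant set - similarity to irrelevant set.
        score = cosine(query, vec) + cosine(vec, rel_centroid) - cosine(vec, irr_centroid)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for this example.
model = train_nb([
    (terms("<doc><title>bayesian classifier</title><body>naive bayes text classification</body></doc>"), True),
    (terms("<doc><title>football</title><body>match score goal</body></doc>"), False),
])
query = Counter("bayesian classification".split())
docs = {
    "a": terms("<doc><body>bayesian text classification method</body></doc>"),
    "b": terms("<doc><body>football goal classification</body></doc>"),
}
ranking = rerank(query, docs, model)
```

In this toy run both documents share a term with the query, but document `b` is classified as irrelevant, so its score is pushed down while document `a` is pushed up. The sketch is meant only to show the general idea that classification can widen the gap that a plain query comparison would produce.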
